Memory Latency Reduction with Fine-grain Migrating Threads in NUMA Shared-memory Multiprocessors

Authors

  • Mikhail N. Dorojevets
  • D. Strukov
Abstract

To fully realize the potential performance benefits of large-scale NUMA shared-memory multiprocessors, efficient techniques for reducing or tolerating long memory access latencies in such systems must be developed. This paper discusses the concept of, and the software and hardware support for, memory latency reduction through fine-grain non-transparent thread migration, referred to as mobile multithreading, in the proposed scalable NUMA shared-memory architecture. Performance evaluation results for the conjugate gradient NAS benchmark demonstrate that the proposed fine-grain thread migration, combined with data prefetching, can effectively reduce memory latency and switch traffic in NUMA shared-memory multiprocessors with a large memory access non-uniformity ratio.

Similar resources

Efficient Runtime Thread Management for the Nano-Threads Programming Model

The nano-threads programming model was proposed to effectively integrate multiprogramming on shared-memory multiprocessors, with the exploitation of fine-grain parallelism from standard applications. A prerequisite for the applicability of the nano-threads programming model is the ability of the runtime environment to manage parallelism at any level of granularity with minimal overheads. In thi...

Experiences with Locking in a NUMA Multiprocessor Operating System Kernel (Operating System Design and Implementation, 1994)

We describe the locking architecture of a new operating system, HURRICANE, designed for large scale shared-memory multiprocessors. Many papers already describe kernel locking techniques, and some of the techniques we use have been previously described by others. However, our work is novel in the particular combination of techniques used, as well as several of the individual techniques themselve...

Comparative Evaluation of Fine- and Coarse-Grain Approaches for Software Distributed Shared Memory

Symmetric multiprocessors (SMPs) connected with low-latency networks provide attractive building blocks for software distributed shared memory systems. Two distinct approaches have been used: the fine-grain approach that instruments application loads and stores to support a small coherence granularity, and the coarse-grain approach based on virtual memory hardware that provides coherence at a p...

Experiences with Data Distribution on NUMA Shared Memory Multiprocessors

The choice of a good data distribution scheme is critical to the performance of data-parallel applications on both distributed memory multiprocessors and NUMA shared memory multiprocessors. The high cost of interprocessor communication in distributed memory multiprocessors makes the minimization of communications the predominant issue in selecting data distribution schemes. However, on NUMA multipro...

Operating System Design Principles for Scalable Shared Memory Multiprocessors

We describe SALSA, an operating system that incorporates techniques for achieving scalability in large-scale shared-memory NUMA multiprocessors. We evaluate the effects of cache organization and caching policy on latency hiding via rapid thread switching. With write-back set-associative caches, we demonstrate significant improvements in program performance with latency hiding when cache miss laten...

Publication date: 2002